Exploring global diverse attention via pairwise temporal relation for video summarization
نویسندگان
چکیده
Abstract Video summarization is an effective way to facilitate video searching and browsing. Most of existing systems employ encoder-decoder based recurrent neural networks, which fail explicitly diversify the system-generated summary frames while requiring intensive computations. In this paper, we propose efficient convolutional network architecture for SUMmarization via Global Diverse Attention called SUM-GDA, adapts attention mechanism in a global perspective consider pairwise temporal relations frames. Particularly, GDA module has two advantages: (1) it models within paired as well among all pairs, thus capturing across one video; (2) reflects importance each frame whole video, leading diverse on these Thus, SUM-GDA beneficial generating form satisfactory summary. Extensive experiments three data sets, i.e., SumMe, TVSum, VTW, have demonstrated that its extension outperform other competing state-of-the-art methods with remarkable improvements. addition, proposed can be run parallel significantly less computational costs, helps deployment highly demanding applications.
منابع مشابه
Automatic video summarization driven by a spatio-temporal attention model
According to the literature, automatic video summarization techniques can be classified in two parts, following the output nature: “video skims”, which are generated using portions of the original video and “key-frame sets”, which correspond to the images, selected from the original video, having a significant semantic content. The difference between these two categories is reduced when we cons...
متن کاملDiverse Sequential Subset Selection for Supervised Video Summarization
Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach the system to learn from human-created summaries how to select informative and diverse subsets, so ...
متن کاملVideo Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling
Human vision system actively seeks salient regions and movements in video sequences to reduce the search effort. Modeling computational visual saliency map provides important information for semantic understanding in many real world applications. In this paper, we propose a novel video saliency detection model for detecting the attended regions that correspond to both interesting objects and do...
متن کاملPerception-oriented video saliency detection via spatio-temporal attention analysis
Human visual system actively seeks salient regions and movements in video sequences to reduce the search effort. Computational visual saliency detection model provides important information for semantic understanding in many real world applications. In this paper, we propose a novel perception-oriented video saliency detection model to detect the attended regions for both interesting objects an...
متن کاملVideo Question Answering via Hierarchical Spatio-Temporal Attention Networks
Open-ended video question answering is a challenging problem in visual information retrieval, which automatically generates the natural language answer from the referenced video content according to the question. However, the existing visual question answering works only focus on the static image, which may be ineffectively applied to video question answering due to the lack of modeling the tem...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Pattern Recognition
سال: 2021
ISSN: ['1873-5142', '0031-3203']
DOI: https://doi.org/10.1016/j.patcog.2020.107677